fix: improve df to records performance #28512

Merged (1 commit), May 15, 2024


@dpgaspar dpgaspar commented May 15, 2024

SUMMARY

Uses pandas' DataFrame.to_dict (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html) to speed up converting a DataFrame to a list of records.
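For readers unfamiliar with the call, here is a minimal sketch of the idea. The function names and the row-by-row baseline are illustrative assumptions and are not copied from the Superset code base; only `DataFrame.to_dict(orient="records")` is the pandas API referenced above.

```python
import pandas as pd


def records_via_zip(df: pd.DataFrame) -> list[dict]:
    """Hypothetical baseline: build one dict per row by zipping the columns.

    Assumes column names are unique.
    """
    columns = list(df.columns)
    return [dict(zip(columns, row)) for row in zip(*(df[col] for col in columns))]


def records_via_to_dict(df: pd.DataFrame) -> list[dict]:
    """Let pandas build the list of row dicts itself."""
    return df.to_dict(orient="records")


# Small illustrative frame (made-up data).
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
print(records_via_to_dict(df))
```

Both helpers should produce equal lists of dicts for a frame with unique column names, so the change is about speed rather than output shape.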

Simple benchmark results:

Time taken with old (rows 10000): 0.024200916290283203
Time taken with new (rows 10000): 0.006799936294555664
Percentage improvement: 71.90%
Time taken with old (rows 20000): 0.020637035369873047
Time taken with new (rows 20000): 0.01266789436340332
Percentage improvement: 38.62%
Time taken with old (rows 30000): 0.030869007110595703
Time taken with new (rows 30000): 0.01958012580871582
Percentage improvement: 36.57%
Time taken with old (rows 40000): 0.04102921485900879
Time taken with new (rows 40000): 0.025703907012939453
Percentage improvement: 37.35%
Time taken with old (rows 50000): 0.052468061447143555
Time taken with new (rows 50000): 0.03229331970214844
Percentage improvement: 38.45%
Time taken with old (rows 60000): 0.06525206565856934
Time taken with new (rows 60000): 0.04095005989074707
Percentage improvement: 37.24%
Time taken with old (rows 70000): 0.07497382164001465
Time taken with new (rows 70000): 0.04742288589477539
Percentage improvement: 36.75%
Time taken with old (rows 80000): 0.0855870246887207
Time taken with new (rows 80000): 0.05132102966308594
Percentage improvement: 40.04%
Time taken with old (rows 90000): 0.0931100845336914
Time taken with new (rows 90000): 0.05876302719116211
Percentage improvement: 36.89%
Time taken with old (rows 100000): 0.10411715507507324
Time taken with new (rows 100000): 0.06470870971679688
Percentage improvement: 37.85%
Time taken with old (rows 110000): 0.11521410942077637
Time taken with new (rows 110000): 0.07151579856872559
Percentage improvement: 37.93%
Time taken with old (rows 120000): 0.12491703033447266
Time taken with new (rows 120000): 0.07925105094909668
Percentage improvement: 36.56%
Time taken with old (rows 130000): 0.13448190689086914
Time taken with new (rows 130000): 0.08515024185180664
Percentage improvement: 36.68%
Time taken with old (rows 140000): 0.1515657901763916
Time taken with new (rows 140000): 0.09524393081665039
Percentage improvement: 37.16%
Time taken with old (rows 150000): 0.15466761589050293
Time taken with new (rows 150000): 0.09760618209838867
Percentage improvement: 36.89%
Time taken with old (rows 160000): 0.1649610996246338
Time taken with new (rows 160000): 0.10681009292602539
Percentage improvement: 35.25%
Time taken with old (rows 170000): 0.1754932403564453
Time taken with new (rows 170000): 0.1124107837677002
Percentage improvement: 35.95%
Time taken with old (rows 180000): 0.18709874153137207
Time taken with new (rows 180000): 0.12268805503845215
Percentage improvement: 34.43%
Time taken with old (rows 190000): 0.2039201259613037
Time taken with new (rows 190000): 0.12427520751953125
Percentage improvement: 39.06%
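
The benchmark script itself is not included in this PR description. A rough sketch of how numbers like these could be produced is shown below; `time_call` and `percentage_improvement` are hypothetical helpers, and the use of `time.perf_counter` is an assumption about the timing method.

```python
import time


def time_call(func, *args) -> float:
    """Return the wall-clock seconds taken by a single call to func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def percentage_improvement(old_seconds: float, new_seconds: float) -> float:
    """Relative time saved by the new implementation, as a percentage of the old time."""
    return (old_seconds - new_seconds) / old_seconds * 100.0


# Reproduce the first reported percentage from the two timings listed above.
improvement = percentage_improvement(0.024200916290283203, 0.006799936294555664)
print(f"Percentage improvement: {improvement:.2f}%")
```

A single timed call per size is noisy, which may explain why the 10000-row result differs so much from the rest; repeating each measurement and taking the minimum is a common way to reduce that noise.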

Memory profiles look the same (measured with 1M rows).

[Two screenshots of the memory profiles: 2024-05-15 14:45:34 and 2024-05-15 14:46:18]
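
The PR does not say which profiler produced the memory profiles. As one possible way to run a comparable check, the sketch below uses the standard-library `tracemalloc` module; `peak_memory_bytes` is a hypothetical helper, and `tracemalloc` only sees allocations made through Python's memory allocators.

```python
import tracemalloc


def peak_memory_bytes(func, *args) -> int:
    """Return the peak traced memory (in bytes) observed while running func(*args)."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Illustrative call on a throwaway workload (not the PR's 1M-row DataFrame).
peak = peak_memory_bytes(lambda: [0] * 100_000)
print(f"peak: {peak / 1024:.0f} KiB")
```

Calling the helper once with the old conversion and once with the new one on the same DataFrame would give two peak values to compare.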

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API


codecov bot commented May 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.62%. Comparing base (76d897e) to head (4cba111).
Report is 116 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #28512       +/-   ##
===========================================
+ Coverage   60.48%   77.62%   +17.13%     
===========================================
  Files        1931      521     -1410     
  Lines       76236    37436    -38800     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    29060    -17054     
+ Misses      28017     8376    -19641     
+ Partials     2105        0     -2105     
Flag        Coverage Δ
hive        ?
javascript  ?
mysql       77.18% <100.00%> (?)
postgres    77.29% <100.00%> (?)
presto      53.67% <100.00%> (-0.14%) ⬇️
python      77.62% <100.00%> (+14.13%) ⬆️
sqlite      76.73% <100.00%> (?)
unit        ?


@dpgaspar dpgaspar marked this pull request as ready for review May 15, 2024 13:52
@dpgaspar dpgaspar requested a review from craig-rueda May 15, 2024 13:53

@betodealmeida betodealmeida left a comment


Nice!

@dpgaspar dpgaspar merged commit 11164e2 into apache:master May 15, 2024
62 of 63 checks passed
@dpgaspar dpgaspar deleted the fix/df-to-recored-efficiency branch May 15, 2024 14:30
jzhao62 pushed a commit to jzhao62/superset that referenced this pull request May 16, 2024
@michael-s-molina michael-s-molina added the v4.0 label (added by the release manager to track PRs to be included in the 4.0 branch) May 23, 2024
michael-s-molina pushed a commit that referenced this pull request May 30, 2024
EnxDev pushed a commit to EnxDev/superset that referenced this pull request May 31, 2024
@mistercrunch mistercrunch added the 🍒 4.0.2 and 🏷️ bot labels (🏷️ bot: used by `supersetbot` to keep track of which PRs were auto-tagged with release labels) Jul 24, 2024
Labels: 🏷️ bot, size/S, v4.0, 🍒 4.0.2